Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Update digest to support latest Nipoppy processing status files #166

Merged
merged 38 commits into from
Jan 16, 2025

Conversation

alyssadai
Copy link
Collaborator

@alyssadai alyssadai commented Dec 13, 2024

Changes proposed in this pull request:

  • Rename READMEs, schemas, and example input files to remove "bagel" to avoid confusion with files used/generated by Neurobagel / Nipoppy (input files are now just called digest files or inputs)
  • Update imaging digest schema to align with latest Nipoppy processing status file schema
    • Rename identifier columns to match Nipoppy
    • Remove columns no longer needed: PHASE__, STAGE__, HAS_DATATYPE__, HAS_IMAGE, has_mri_data
    • Make pipeline_starttime optional
    • Add columns: pipeline_step, status
    • Remove no-longer-needed IsPrefixedColumn, MissingValue column properties
    • The digest still has a few additional columns for now that may be useful in processing status files in future
    • See also Consider removing column categories in digest schemas #163
  • Use combination of pipeline-version-step instead of pipeline-version for tracking completion
    • Each pipeline-version-step combination now has distinct plots
  • Accept TSV instead of CSV inputs (Nipoppy now only produces TSVs)
  • Regenerate reference example inputs as TSVs, with column names updated according to latest schema
  • Regenerate test example files as TSVs, with column names updated according to latest schema
  • Update tests (mostly renaming column references)
  • Update pre-generated QPN file paths based on https://github.com/neurodatascience/qpn_workflows

For reviewer: you can test out the changes in the digest using either of these files:

Checklist

This section is for the PR reviewer

  • PR has an interpretable title with a prefix ([ENH], [FIX], [REF], [TST], [CI], [MNT], [INF], [MODEL], [DOC]) (see our Contributing Guidelines for more info)
  • PR links to GitHub issue with mention Closes #XXXX
  • Tests pass
  • Checks pass

For new features:

  • Tests have been added

For bug fixes:

  • There is at least one test that would fail under the original bug conditions.

- match column names to https://nipoppy.readthedocs.io/en/latest/schemas/index.html#imaging-bagel-file
- make pipeline_starttime optional
- remove "PHASE__" & "STAGE__" cols, "step" & "status" cols
- simplify column descriptions
- remove "IsPrefixedColumn"
- rename ID columns & update their descriptions
- refactor out primary session column name into a variable
@alyssadai
Copy link
Collaborator Author

Hey @michellewang, requesting your review here on just the digest schema changes as well as text including READMEs and status descriptions. Will tag you on the relevant files/sections under files changed!

README.md Show resolved Hide resolved
Copy link
Collaborator

@michellewang michellewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alyssadai let me know if I missed anything!

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
digest/utility.py Show resolved Hide resolved
schemas/README.md Outdated Show resolved Hide resolved
schemas/imaging_digest_schema.json Outdated Show resolved Hide resolved
schemas/imaging_digest_schema.json Outdated Show resolved Hide resolved
schemas/imaging_digest_schema.json Outdated Show resolved Hide resolved
schemas/imaging_digest_schema.json Outdated Show resolved Hide resolved
schemas/phenotypic_digest_schema.json Outdated Show resolved Hide resolved
schemas/phenotypic_digest_schema.json Show resolved Hide resolved
@surchs surchs self-requested a review January 7, 2025 21:35
Copy link
Contributor

@surchs surchs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alyssadai. Changes look good, I did flag one thing I believe is an accidental change/bug with pd.read_csv to pd.read_tsv. I haven't tried this locally, but I'm pretty sure that doesn't work. Even if I'm wrong about that, I'd say add a test that covers the load_file_from_path route in utility, because it doesn't seem tested yet.

digest/utility.py Outdated Show resolved Hide resolved
digest/app.py Outdated Show resolved Hide resolved
@alyssadai
Copy link
Collaborator Author

Blocked from merging until https://github.com/neurodatascience/qpn_workflows is updated with latest QPN digest files.

@alyssadai alyssadai requested a review from surchs January 13, 2025 18:21
Copy link
Contributor

@surchs surchs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff @alyssadai 🎉

🧑‍🍳

schemas/imaging_digest_schema.json Show resolved Hide resolved
@alyssadai alyssadai merged commit d6887ed into main Jan 16, 2025
2 checks passed
@alyssadai alyssadai deleted the update-proc-status-file-parsing branch January 16, 2025 05:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment